unweighted pair group method with arithmetic mean (UPGMA), weighted pair group
method with arithmetic mean (WPGMA), and neighbor joining (NJ) [14].
Both UPGMA and WPGMA assume a constant molecular clock, that is, that sequences in all
lineages accumulate mutations at the same rate. The molecular clock is defined as the average
rate at which a sequence accumulates mutations, and it is used to measure the evolutionary
divergence of sequences. The two methods also share essentially the same algorithm. Each
representative sequence starts as a cluster of its own. The two closest clusters are then
joined, and the distances from the joined pair to every remaining cluster are recalculated
as an average. These steps are repeated until all sequences are connected in a single
cluster. The difference between the two methods lies in how that average is taken. In UPGMA,
every sequence in the joined cluster carries equal weight, so the average is taken in
proportion to the number of sequences in each of the two merged clusters. In WPGMA, the two
merged clusters are averaged equally regardless of their size, so sequences in the smaller
cluster effectively carry more weight.
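The clustering procedure can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `pgma`, the `weighted` flag, and the four-sequence distance table are all made up for the example, and the node height is simply taken as half the distance between the joined clusters, which is what the clock assumption implies.

```python
from itertools import combinations

def pgma(names, dist, weighted=False):
    """Cluster sequences; return merges as (cluster_a, cluster_b, node_height).

    weighted=False gives UPGMA, weighted=True gives WPGMA.
    """
    size = {(n,): 1 for n in names}  # cluster (tuple of names) -> member count
    d = {frozenset([(a,), (b,)]): dist[a][b] for a, b in combinations(names, 2)}
    merges = []
    while len(size) > 1:
        pair = min(d, key=d.get)     # closest pair of clusters
        a, b = sorted(pair)
        merged = a + b
        for c in size:
            if c == a or c == b:
                continue
            dac = d.pop(frozenset([a, c]))
            dbc = d.pop(frozenset([b, c]))
            if weighted:
                # WPGMA: the two merged clusters count equally
                new_dist = (dac + dbc) / 2
            else:
                # UPGMA: average in proportion to cluster size
                new_dist = (size[a] * dac + size[b] * dbc) / (size[a] + size[b])
            d[frozenset([merged, c])] = new_dist
        merges.append((a, b, d.pop(pair) / 2))
        size[merged] = size.pop(a) + size.pop(b)
    return merges

# Hypothetical pairwise distances between four representative sequences
dist = {"A": {"B": 2, "C": 4, "D": 10}, "B": {"C": 6, "D": 10}, "C": {"D": 16}}
upgma_merges = pgma(list("ABCD"), dist)
wpgma_merges = pgma(list("ABCD"), dist, weighted=True)
print(upgma_merges[-1])  # last join under UPGMA
print(wpgma_merges[-1])  # last join under WPGMA
```

With these distances both variants join A and B first and then add C. They diverge only at the last step, where the two-member cluster (A, B) and the single sequence C are averaged against D: UPGMA weights (A, B) twice as heavily as C, whereas WPGMA treats the two clusters equally.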
The NJ algorithm does not assume a molecular clock, and it adjusts for rate variation
among branches. The algorithm begins with an initial unresolved star-like tree made up of
the representative sequences. The distance between each pair is evaluated and corrected for
the average distance of each sequence to all the others. The first join is created by taking
the two sequences with the smallest corrected distance as neighbors and inserting a branch
between them and the rest of the star-like tree. The lengths of the new branches are
calculated, and the joined pair is then treated as a single terminal whose distances to the
remaining sequences are recalculated on the basis of their average distance. This process is
repeated until the initial star-like tree is fully resolved.
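A compact sketch of the NJ procedure is given below. It follows the commonly used formulation in which the pair to join is the one that minimizes the Q criterion, Q(i, j) = (n - 2) d(i, j) - r(i) - r(j), where r(i) is the sum of the distances from i to all other active nodes. The function name, the internal node labels (`node0`, `node1`, ...), and the five-sequence distance table are assumptions made for illustration only.

```python
from itertools import combinations

def neighbor_joining(names, dist):
    """Return the unrooted NJ tree as a list of (node, node, branch_length) edges."""
    nodes = list(names)
    d = {frozenset(p): dist[p[0]][p[1]] for p in combinations(nodes, 2)}
    edges = []
    next_id = 0
    while len(nodes) > 2:
        n = len(nodes)
        # r[i]: total distance from i to every other active node
        r = {i: sum(d[frozenset((i, k))] for k in nodes if k != i) for i in nodes}
        # Q criterion: raw distance corrected for each node's overall divergence
        f, g = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]])
        dfg = d[frozenset((f, g))]
        # Branch lengths from f and g to the new internal node u
        len_f = dfg / 2 + (r[f] - r[g]) / (2 * (n - 2))
        len_g = dfg - len_f
        u = f"node{next_id}"  # hypothetical label for the new internal node
        next_id += 1
        edges += [(f, u, len_f), (g, u, len_g)]
        # Distances from u to the nodes that remain in the star
        rest = [k for k in nodes if k not in (f, g)]
        for k in rest:
            d[frozenset((u, k))] = (d[frozenset((f, k))] + d[frozenset((g, k))] - dfg) / 2
        nodes = rest + [u]
    a, b = nodes
    edges.append((a, b, d[frozenset((a, b))]))
    return edges

# Hypothetical distances between five representative sequences
dist = {
    "a": {"b": 5, "c": 9, "d": 9, "e": 8},
    "b": {"c": 10, "d": 10, "e": 9},
    "c": {"d": 8, "e": 7},
    "d": {"e": 3},
}
nj_edges = neighbor_joining("abcde", dist)
for edge in nj_edges:
    print(edge)
```

In this example the first pair joined is a and b, even though d and e have the smallest raw distance. The two new branches also get different lengths (2 and 3), which is how the method accommodates unequal rates among lineages. The sketch makes no attempt to handle negative branch lengths, which NJ can produce on noisy data.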
The tree construction methods briefly described above are distance-based and
computationally inexpensive. Other methods, including maximum parsimony (MP) and maximum
likelihood (ML), are character-based: they make use of all known evolutionary information
(individual substitutions) to determine the most likely ancestral relationships, at a higher
computational cost. Refer to a textbook on phylogenetics for more details about the various
tree construction methods.
A phylogenetic tree is either rooted (with a common ancestor for all sequences) or
unrooted (without a common ancestor). Unrooted trees are constructed when we do not assume
that the molecular clock holds. They reflect only the relationships among the representative
sequences, not the evolutionary path. However, if we can assume that sequences evolve at
rates that remain constant through time across lineages, then the root of the tree can be
estimated as the midpoint of the longest span across the tree (midpoint rooting).
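Midpoint rooting can be illustrated with a short sketch. The function below finds the longest leaf-to-leaf path in an unrooted tree and reports where its midpoint falls. The edge-list representation, the function name, and the example tree are assumptions for illustration, and the sketch assumes all branch lengths are positive.

```python
from collections import defaultdict

def midpoint(edges):
    """Locate the midpoint of the longest path in an unrooted tree.

    Returns (x, y, offset): the root sits on edge x-y, `offset` away from x.
    """
    adj = defaultdict(dict)
    for a, b, w in edges:
        adj[a][b] = w
        adj[b][a] = w

    def farthest(start):
        # Depth-first walk that keeps the path to the most distant node
        best = (0.0, [start])
        stack = [(start, None, 0.0, [start])]
        while stack:
            node, parent, dist, path = stack.pop()
            if dist > best[0]:
                best = (dist, path)
            for nxt, w in adj[node].items():
                if nxt != parent:
                    stack.append((nxt, node, dist + w, path + [nxt]))
        return best

    # Two sweeps find the longest span: farthest node from any start,
    # then farthest node from that one
    _, path = farthest(edges[0][0])
    span, path = farthest(path[-1])
    # Walk along the path until half of the span is used up
    remaining = span / 2
    for x, y in zip(path, path[1:]):
        if remaining <= adj[x][y]:
            return x, y, remaining
        remaining -= adj[x][y]

# Hypothetical unrooted tree with internal nodes n1, n2, n3
tree = [
    ("A", "n1", 2.0), ("B", "n1", 3.0), ("n1", "n2", 3.0), ("C", "n2", 4.0),
    ("n2", "n3", 2.0), ("D", "n3", 5.0), ("E", "n3", 1.0),
]
root_position = midpoint(tree)
print(root_position)
```

Here the longest span runs between B and D (length 13), so the root is placed 6.5 away from each of them, on the branch between n3 and n2.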
7.2.5 Microbial Diversity Analysis
Microbial diversity is calculated from the feature table, obtained in the denoising step
above, to describe the different species of microbes present within individual samples and
how that composition differs between samples. The diversity of the microbial community
within a sample is called alpha diversity, while the similarity or dissimilarity between
the microbial communities of two samples is called beta diversity. Alpha diversity metrics
include observed features (a measure of richness), Shannon’s diversity index, Faith’s
phylogenetic diversity, and evenness. Beta diversity metrics include Jaccard distance,
Bray–Curtis distance, and unweighted UniFrac distance.
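The non-phylogenetic metrics can be computed directly from the rows of a feature table, as the sketch below shows. The function names and the two count vectors are made up for the example. Evenness is computed here as Pielou's evenness, which is an assumption about the intended variant, and the logarithm base for Shannon's index differs between tools, so the natural logarithm used here is only one possible choice. Faith's phylogenetic diversity and UniFrac also require a phylogenetic tree and are left out.

```python
import math

def observed_features(counts):
    """Richness: number of features with a non-zero count."""
    return sum(1 for c in counts if c > 0)

def shannon(counts):
    """Shannon's diversity index (natural logarithm)."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def pielou_evenness(counts):
    """Shannon's index divided by its maximum for the observed richness."""
    s = observed_features(counts)
    return shannon(counts) / math.log(s) if s > 1 else 0.0

def jaccard_distance(u, v):
    """Presence/absence dissimilarity; assumes at least one feature is present."""
    present_u = {i for i, c in enumerate(u) if c > 0}
    present_v = {i for i, c in enumerate(v) if c > 0}
    return 1 - len(present_u & present_v) / len(present_u | present_v)

def bray_curtis(u, v):
    """Abundance-based dissimilarity between two samples."""
    return sum(abs(a - b) for a, b in zip(u, v)) / (sum(u) + sum(v))

# Hypothetical feature counts for two samples (same feature order in both)
sample_1 = [10, 10, 10, 10]
sample_2 = [2, 0, 0, 18]

print(observed_features(sample_1), observed_features(sample_2))
print(shannon(sample_1), pielou_evenness(sample_1))
print(jaccard_distance(sample_1, sample_2))
print(bray_curtis(sample_1, sample_2))
```

The first sample is perfectly even, so its evenness is 1 and its Shannon index equals the logarithm of its richness. The two beta diversity metrics disagree on how different the samples are because Jaccard distance looks only at which features are present, while Bray–Curtis distance also takes their abundances into account.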